1 Introduction

miMeta is a R package and it implements the new methods for meta-analysis of microbiome association studies that respect the unique features of microbiome data such as compositionality. This tutorial gives examples of applying a meta-analysis method called Melody [1] for microbial signature selection. Melody first generates summary statistics for each study and then uses the summary statistics across studies to select microbial signatures.

2 Installation

Install package from GitHub.

# if(!require("miMeta", quietly = TRUE)){
#   devtools::install_github("ZjpWei/miMeta")
# }

Load the packages.

library("miMeta")
library("tidyverse")

3 Meta-analysis given all samples of all participating studies

There are two approaches miMeta can run meta-analysis.

Approach #1: If the individual-level sample data across all participating studies are available, the function melody can take these data as input to generate, harmonize, and combine summary association statistics across studies for accurate and robust identification of microbial signatures.

Approach #2: If individual-level samples of each study are not in a central place, summary statistics can be generated for each study separately using the functions melody.null.model and melody.get.summary. The summary statistics can be transported to a central place to be harmonized and combined for signature selection using the function melody.meta.summary.

We demonstrate Approach #1 in this session and Approach #2 in the next session.

Here we use the datasets from two metagenomics studies of colorectal cancer (CRC) [2] to demonstrate the use of each function. The CRC_abd is a list of sample-by-feature matrices of relative abundance counts of 267 species under order Clostridiales from the two studies. The CRC_meta is a data frame including the sample-level variables from these two studies. In particular, the following variables are in the CRC_meta data:

  • Sample identity: “Sample_ID”

  • Study name: “Study”

  • Disease status: “Group”

data("CRC_data")
CRC_abd <- CRC_data$CRC_abd
CRC_meta <- CRC_data$CRC_meta
# Prepare input data
rel.abd <- list()
covariate.interest <- list()
for(d in c("FR-CRC", "DE-CRC")){
  rel.abd[[d]] <- CRC_abd[CRC_meta$Sample_ID[CRC_meta$Study == d],]
  disease <- as.numeric(CRC_meta$Group[CRC_meta$Study == d] == "CRC")
  names(disease) <- CRC_meta$Sample_ID[CRC_meta$Study == d]
  covariate.interest[[d]] <- data.frame(disease = disease)
}

refs <- c("Coprococcus catus [ref_mOTU_v2_4874]", "Coprococcus catus [ref_mOTU_v2_4874]")
names(refs) <- c("FR-CRC", "DE-CRC")
meta.result <- melody(rel.abd = rel.abd, covariate.interest = covariate.interest, 
                      ref = refs, 
                      verbose = TRUE)
## ++ Construct summary statistics. ++
## ++ Construct summary statistics for study FR-CRC and covariate of interest disease. ++
## ++ Construct summary statistics for study DE-CRC and covariate of interest disease. ++
## ++ Search for the best model for covariate of interest disease. ++

The first figure above shows the number of features shared among studies (here, all 266 species are present in both studies), and the second figure shows the meta-coefficient estimates for the selected microbial signatures for CRC.

The following shows 20 coefficient estimates of the microbial features in the best model (the features with non-zero coefficients are the selected signatures):

head(data.frame(coef = meta.result$disease$coef), n = 20)
##                                                           coef
## Anaerostipes caccae [ref_mOTU_v2_1381]                0.000000
## Anaerostipes hadrus [ref_mOTU_v2_1309]                0.000000
## Anaerotruncus colihominis [ref_mOTU_v2_0884]          0.000000
## Blautia hydrogenotrophica [ref_mOTU_v2_4324]          0.000000
## Blautia obeum [ref_mOTU_v2_4202]                      0.000000
## Blautia obeum [ref_mOTU_v2_4719]                      0.000000
## Blautia producta [ref_mOTU_v2_4020]                   0.000000
## Blautia sp. KLE 1732 [ref_mOTU_v2_0859]               0.000000
## Blautia wexlerae [ref_mOTU_v2_0466]                   0.000000
## butyrate-producing bacterium SS3/4 [ref_mOTU_v2_3825] 0.000000
## Butyrivibrio crossotus [ref_mOTU_v2_4383]             0.000000
## Clostridiales bacterium VE202-14 [ref_mOTU_v2_2689]   0.000000
## Clostridium asparagiforme [ref_mOTU_v2_4394]          0.000000
## Clostridium boltae/clostridioforme [ref_mOTU_v2_0886] 1.288898
## Clostridium citroniae [ref_mOTU_v2_4882]              0.000000
## Clostridium clostridioforme [ref_mOTU_v2_0979]        0.000000
## Clostridium leptum [ref_mOTU_v2_4234]                 0.000000
## Clostridium saccharolyticum [ref_mOTU_v2_1380]        0.000000
## Clostridium scindens [ref_mOTU_v2_0883]               0.000000
## Clostridium sp. AT4 [meta_mOTU_v2_7263]               0.000000

By default, Melody only outputs the information about the best model. If users want to see the information about all models on the search path of subset size, set output.best.one = FALSE. See help(melody) for more details.

4 Meta-analysis given samples of one study at a time

If the datasets of different studies live in different locations and cannot be conveniently shared among studies, we can first generate summary statistics for each study:

# Generate summary statistics for each study
null.obj.FR <- melody.null.model(rel.abd = rel.abd["FR-CRC"], ref = "Coprococcus catus [ref_mOTU_v2_4874]")
summary.stats.FR <- melody.get.summary(null.obj = null.obj.FR,
                                       covariate.interest = covariate.interest["FR-CRC"])

null.obj.DE <- melody.null.model(rel.abd = rel.abd["DE-CRC"], ref = "Coprococcus catus [ref_mOTU_v2_4874]")
summary.stats.DE <- melody.get.summary(null.obj = null.obj.DE,
                                       covariate.interest = covariate.interest["DE-CRC"])

These summary statistics can be transported to a central location for meta-analysis:

# Concatenate summary statistics
summary.stats.all <- c(summary.stats.FR, summary.stats.DE)
# Meta-analysis to harmonize and combine summary statistics across studies
meta.result.2 <- melody.meta.summary(summary.stats = summary.stats.all, verbose = TRUE)
## ++ Search for the best model for covariate of interest disease. ++

The meta-analysis results generated this way are identical to those generated in the previous session.

5 Meta-analysis of large-scale association scan

Microbiome association studies can involve a large number of covariates of interest (e.g., omics variables). We show here how to use Melody to meta-analyze eight microbiome-metabolome association studies [3]. In these eight studies, we have 101 genera and 450 metabolites and we are interested in identifying genera that are associated with individual metabolites. This analysis takes approximately 20 minutes.

# Load metabolome data: 
# You may see the following website on how to directly load data from 
# github into R https://github.com/ZjpWei/Melody/data/

load("~/Documents/Researches/Package_build/miMeta/metabolome_data.Rdata")
otu_data_lst <- metabolome_data$otu_data_lst
cmpd_data_lst <- metabolome_data$cmpd_data_lst
covariates_adjust_lst <- metabolome_data$covariates_adjust_lst
cluster_data_lst <- metabolome_data$cluster_data_lst

# get null model
null.obj <- melody.null.model(rel.abd = otu_data_lst, covariate.adjust = covariates_adjust_lst)

# Get summary statistics
summary.stats <- melody.get.summary(null.obj = null.obj, covariate.interest = cmpd_data_lst, cluster = cluster_data_lst)

# Meta-analysis
meta.scan.result <- melody.meta.summary(summary.stats = summary.stats, verbose = TRUE)

The following shows 20 coefficient estimates of the microbial features for 5 metabolites.

selected.num <- sort(unlist(lapply(meta.scan.result, function(d){sum(d$coef!=0)})), decreasing = TRUE)
top.cov.name <- names(selected.num)[1:min(5, length(selected.num))]
coef_mat <- sapply(meta.scan.result[top.cov.name], function(d){d$coef}, simplify = TRUE)
rownames(coef_mat) <- gsub(".*;g__","g__",rownames(coef_mat))
head(coef_mat, n = 20)
##                      HMDB0000792 HMDB0000619 HMDB0002064 HMDB0003355
## g__Bifidobacterium     0.0000000   0.0000000  0.00000000   0.0000000
## g__Collinsella         0.0000000   0.0000000  0.00000000   0.1392989
## g__Eggerthella         0.0000000   0.0000000  0.00000000   0.0000000
## g__Alloprevotella      0.0000000   0.0000000  0.00000000   0.0000000
## g__Bacteroides         0.0000000   0.0000000  0.00000000   0.0000000
## g__Paraprevotella      0.0000000   0.0000000  0.00000000   0.0000000
## g__Phocaeicola        -0.1031211   0.0000000  0.06025786   0.1424695
## g__Prevotella          0.0000000   0.0000000  0.00000000   0.0000000
## g__Barnesiella         0.2254599  -0.1733819 -0.24018615  -0.3651703
## g__Butyricimonas       0.1433688  -0.1044560 -0.13233804  -0.1906577
## g__Odoribacter         0.1664671  -0.1320896 -0.14932793  -0.2999097
## g__Alistipes           0.2430754  -0.1869777 -0.24067746  -0.3775380
## g__Parabacteroides     0.0000000   0.0000000  0.00000000   0.0000000
## g__Cryptobacteroides   0.3851180  -0.1400692 -0.18922028   0.0000000
## g__Bilophila           0.0000000   0.0000000  0.00000000   0.0000000
## g__PeH17               0.5338486  -0.3088130 -0.30735239  -0.5213579
## g__UBA11524            0.3895136  -0.1816429 -0.18314908  -0.5748607
## g__Clostridium        -0.2424887   0.0000000  0.10112143   0.3793442
## g__Anaerotignum        0.0000000  -0.0711810  0.00000000   0.0000000
## g__Acetatifactor       0.2434552  -0.1400899 -0.16716833  -0.2795134
##                      HMDB0000036
## g__Bifidobacterium    0.00000000
## g__Collinsella        0.00000000
## g__Eggerthella        0.00000000
## g__Alloprevotella     0.00000000
## g__Bacteroides        0.00000000
## g__Paraprevotella     0.00000000
## g__Phocaeicola        0.00000000
## g__Prevotella         0.00000000
## g__Barnesiella       -0.14879099
## g__Butyricimonas     -0.09451219
## g__Odoribacter       -0.11556537
## g__Alistipes         -0.17085423
## g__Parabacteroides    0.00000000
## g__Cryptobacteroides -0.08414073
## g__Bilophila         -0.05751673
## g__PeH17             -0.25975618
## g__UBA11524          -0.17732752
## g__Clostridium        0.00000000
## g__Anaerotignum       0.00000000
## g__Acetatifactor     -0.08792045

6 Session information

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Chicago
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lubridate_1.9.2  forcats_1.0.0    stringr_1.5.0    dplyr_1.1.4     
##  [5] purrr_1.0.1      readr_2.1.4      tidyr_1.3.0      tibble_3.2.1    
##  [9] ggplot2_3.4.4    tidyverse_2.0.0  miMeta_0.1.0     BiocStyle_2.28.0
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.7            utf8_1.2.4            generics_0.1.3       
##  [4] stringi_1.7.12        enrichwith_0.3.1      lattice_0.21-8       
##  [7] hms_1.1.3             digest_0.6.33         magrittr_2.0.3       
## [10] timechange_0.2.0      evaluate_0.21         grid_4.3.1           
## [13] brglm2_0.9.2          bookdown_0.35         iterators_1.0.14     
## [16] fastmap_1.1.1         foreach_1.5.2         doParallel_1.0.17    
## [19] plyr_1.8.9            jsonlite_1.8.7        Matrix_1.6-0         
## [22] nnet_7.3-19           gridExtra_2.3         BiocManager_1.30.21.1
## [25] fansi_1.0.6           scales_1.3.0          UpSetR_1.4.0         
## [28] codetools_0.2-19      numDeriv_2016.8-1.1   jquerylib_0.1.4      
## [31] cli_3.6.2             rlang_1.1.3           munsell_0.5.0        
## [34] withr_3.0.0           cachem_1.0.8          yaml_2.3.7           
## [37] tools_4.3.1           parallel_4.3.1        tzdb_0.4.0           
## [40] colorspace_2.1-0      abess_0.4.8           vctrs_0.6.5          
## [43] R6_2.5.1              lifecycle_1.0.4       MASS_7.3-60          
## [46] pkgconfig_2.0.3       pillar_1.9.0          bslib_0.5.0          
## [49] gtable_0.3.4          glue_1.7.0            Rcpp_1.0.12          
## [52] highr_0.10            xfun_0.39             tidyselect_1.2.0     
## [55] rstudioapi_0.15.0     knitr_1.43            farver_2.1.1         
## [58] htmltools_0.5.5       labeling_0.4.3        rmarkdown_2.23       
## [61] compiler_4.3.1

7 Reference

  1. Wei Z, Chen G, Tang ZZ. Melody identifies generalizable microbial signatures in microbiome association meta-analysis. Submitted.

  2. Wirbel, Jakob et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer.

  3. Muller, E., Algavi, Y.M. & Borenstein, E. The gut microbiome-metabolome dataset collection: a curated resource for integrative meta-analysis. npj Biofilms Microbiomes 8, 79 (2022).